21 research outputs found
Towards Continual Reinforcement Learning for Quadruped Robots
Quadruped robots have emerged as an evolving technology that currently
leverages simulators to develop a robust controller capable of functioning in
the real-world without the need for further training. However, since it is
impossible to predict all possible real-world situations, our research explores
the possibility of enabling them to continue learning even after their
deployment. To this end, we designed two continual learning scenarios,
sequentially training the robot on different environments while simultaneously
evaluating its performance across all of them. Our approach sheds light on the
extent of both forward and backward skill transfer, as well as the degree to
which the robot might forget previously acquired skills. By addressing these
factors, we hope to enhance the adaptability and performance of quadruped
robots in real-world scenarios.Comment: 4 pages; Presented in the 3rd International Conference on Interactive
Media, Smart Systems and Emerging Technologies (IMET
Using Centroidal Voronoi Tessellations to Scale Up the Multi-dimensional Archive of Phenotypic Elites Algorithm
The recently introduced Multi-dimensional Archive of Phenotypic Elites
(MAP-Elites) is an evolutionary algorithm capable of producing a large archive
of diverse, high-performing solutions in a single run. It works by discretizing
a continuous feature space into unique regions according to the desired
discretization per dimension. While simple, this algorithm has a main drawback:
it cannot scale to high-dimensional feature spaces since the number of regions
increase exponentially with the number of dimensions. In this paper, we address
this limitation by introducing a simple extension of MAP-Elites that has a
constant, pre-defined number of regions irrespective of the dimensionality of
the feature space. Our main insight is that methods from computational geometry
could partition a high-dimensional space into well-spread geometric regions. In
particular, our algorithm uses a centroidal Voronoi tessellation (CVT) to
divide the feature space into a desired number of regions; it then places every
generated individual in its closest region, replacing a less fit one if the
region is already occupied. We demonstrate the effectiveness of the new
"CVT-MAP-Elites" algorithm in high-dimensional feature spaces through
comparisons against MAP-Elites in maze navigation and hexapod locomotion tasks
Black-Box Data-efficient Policy Search for Robotics
The most data-efficient algorithms for reinforcement learning (RL) in
robotics are based on uncertain dynamical models: after each episode, they
first learn a dynamical model of the robot, then they use an optimization
algorithm to find a policy that maximizes the expected return given the model
and its uncertainties. It is often believed that this optimization can be
tractable only if analytical, gradient-based algorithms are used; however,
these algorithms require using specific families of reward functions and
policies, which greatly limits the flexibility of the overall approach. In this
paper, we introduce a novel model-based RL algorithm, called Black-DROPS
(Black-box Data-efficient RObot Policy Search) that: (1) does not impose any
constraint on the reward function or the policy (they are treated as
black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for
data-efficient RL in robotics, and (3) is as fast (or faster) than analytical
approaches when several cores are available. The key idea is to replace the
gradient-based optimization algorithm with a parallel, black-box algorithm that
takes into account the model uncertainties. We demonstrate the performance of
our new algorithm on two standard control benchmark problems (in simulation)
and a low-cost robotic manipulator (with a real robot).Comment: Accepted at the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS) 2017; Code at
http://github.com/resibots/blackdrops; Video at http://youtu.be/kTEyYiIFGP
A comparison of illumination algorithms in unbounded spaces
International audienceIllumination algorithms are a new class of evolutionary algorithms capable of producing large archives of diverse and high-performing solutions. Examples of such algorithms include Novelty Search with Local Competition (NSLC), the Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) and the newly introduced Cen-troidal Voronoi Tessellation (CVT) MAP-Elites. While NSLC can be used in unbounded behavioral spaces, MAP-Elites and CVT-MAP-Elites require the user to manually specify the bounds. In this study, we introduce variants of these algorithms that expand their bounds based on the discovered solutions. In addition, we introduce a novel algorithm called "Cluster-Elites" that can adapt its bounds to non-convex spaces. We compare all algorithms in a maze navigation problem and illustrate that Cluster-Elites and the expansive variants of MAP-Elites and CVT-MAP-Elites have comparable or better performance than NSLC, MAP-Elites and CVT-MAP-Elites
A survey on policy search algorithms for learning robot controllers in a handful of trials
Most policy search algorithms require thousands of training episodes to find
an effective policy, which is often infeasible with a physical robot. This
survey article focuses on the extreme other end of the spectrum: how can a
robot adapt with only a handful of trials (a dozen) and a few minutes? By
analogy with the word "big-data", we refer to this challenge as "micro-data
reinforcement learning". We show that a first strategy is to leverage prior
knowledge on the policy structure (e.g., dynamic movement primitives), on the
policy parameters (e.g., demonstrations), or on the dynamics (e.g.,
simulators). A second strategy is to create data-driven surrogate models of the
expected reward (e.g., Bayesian optimization) or the dynamical model (e.g.,
model-based policy search), so that the policy optimizer queries the model
instead of the real system. Overall, all successful micro-data algorithms
combine these two strategies by varying the kind of model and prior knowledge.
The current scientific challenges essentially revolve around scaling up to
complex robots (e.g., humanoids), designing generic priors, and optimizing the
computing time.Comment: 21 pages, 3 figures, 4 algorithms, accepted at IEEE Transactions on
Robotic
Comparing multimodal optimization and illumination
International audienceIllumination algorithms are a recent addition to the evolutionary computation toolbox that allows the generation of many diverse and high-performing solutions in a single run. Nevertheless, traditional multimodal optimization algorithms also search for diverse and high-performing solutions: could some multimodal optimization algorithms be beeer at illumination than illumination algorithms? In this study, we compare two illumination algorithms (Novelty Search with Local Competition (NSLC), MAP-Elites) with two multimodal optimization ones (Clearing, Restricted Tournament Selection) in a maze navigation task. e results show that Clearing can have comparable performance to MAP-Elites and NSLC
Safety-Aware Robot Damage Recovery Using Constrained Bayesian Optimization and Simulated Priors
International audienceThe recently introduced Intelligent Trial-and-Error (IT&E) algorithm showed that robots can adapt to damage in a matter of a few trials. The success of this algorithm relies on two components: prior knowledge acquired through simulation with an intact robot, and Bayesian optimization (BO) that operates on-line, on the damaged robot. While IT&E leads to fast damage recovery, it does not incorporate any safety constraints that prevent the robot from attempting harmful behaviors. In this work, we address this limitation by replacing the BO component with a constrained BO procedure. We evaluate our approach on a simulated damaged humanoid robot that needs to crawl as fast as possible, while performing as few unsafe trials as possible. We compare our new " safety-aware IT&E " algorithm to IT&E and a multi-objective version of IT&E in which the safety constraints are dealt as separate objectives. Our results show that our algorithm outperforms the other approaches, both in crawling speed within the safe regions and number of unsafe trials
Name Filter: A Countermeasure against Information Leakage Attacks in Named Data Networking
International audienceNamed Data Networking (NDN) has emerged as a future networking architecture having thepotential to replace the Internet. In order to do so, NDN needs to cope with inherent problems of the Internetsuch as attacks that cause information leakage from an enterprise. Since NDN has not yet been deployed ona large scale, it is currently unknown how such attacks can occur, let alone what countermeasures can betaken against them. In this study, we first show that information leakage in NDN, can be caused by malwareinside an enterprise, which uses steganography to produce malicious Interest names encoding confidentialinformation. We investigate such attacks by utilizing a content name dataset based on uniform resourcelocators (URLs) collected by a web crawler. Our main contribution is a name filter based on anomalydetection that takes the dataset as input and classifies a name in the Interest as legitimate or not. Ourevaluation shows that malware can exploit the path part in the URL-based NDN name to create maliciousnames, thus, information leakage in NDN cannot be prevented completely. However, we illustrate for thefirst time that our filter can dramatically choke the leakage throughput causing the malware to be 137 timesless efficient at leaking information. This finding opens up an interesting avenue of research that could resultin a safer future networking architecture
A survey on policy search algorithms for learning robot controllers in a handful of trials
International audienceMost policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word “big-data,” we refer to this challenge as “micro-data reinforcement learning.” In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time